GENES ASSOCIATED WITH DISEASES OF THE COLON

TECHNICAL FIELD 

The invention relates to seven genes associated with diseases of the colon, particularly colon
cancer, as identified by their coexpression with known colon cancer genes. The invention also relates to
the use of these biomolecules in diagnosis, prognosis, prevention, treatment, and evaluation of therapies 
for diseases of the colon. 

BACKGROUND ART 

Colon cancer is the third leading cause of cancer deaths in the United States. Each year over
100,000 new cases are diagnosed, and 50,000 patients die from the disease. In large part this death rate is
due to the inability to diagnose the disease at an early stage (Wanebo (1993) Colorectal Cancer, Mosby,
St Louis MO). Although some of the genes that participate in or regulate the growth of colon cells are
known, many other genes remain to be identified. Identification of new genes with significant levels of
expression in cells of the diseased colon will provide new diagnostics, opportunities for earlier patient
diagnosis, and targets for the development of therapeutic agents.

The present invention satisfies a need in the art by providing new compositions, seven genes 
associated with diseases of the colon identified by their coexpression patterns with genes expressed in 
colon cancer, that are useful for diagnosis, prognosis, treatment, prevention, and evaluation of therapies 
for diseases of the colon. 

SUMMARY OF THE INVENTION

In one aspect, the invention provides a substantially purified polynucleotide comprising a
gene that is coexpressed with one or more known colon cancer genes in a plurality of biological samples.
Preferably, known colon cancer genes are selected from the group consisting of carbonic anhydrase I, II,
and IV (CA I, II, and IV), carcinoembryonic antigen family of proteins (cea), colorectal carcinoma
tumor-associated antigen (CO-029), down-regulated in adenoma (dra), fatty-acid binding protein (fabp),
galectin (galec), glutathione peroxidase (gpx2), guanylin (guan), cytokeratin 8 and 20 (ker 8 and 20),
cadherin (cadher), and intestinal mucin (muc-2). Preferred embodiments include: (a) a polynucleotide
sequence selected from SEQ ID NOs:1-7; (b) a polynucleotide sequence which encodes the polypeptide of
SEQ ID NOs:8 or 9; (c) a polynucleotide sequence having at least 75% identity to the polynucleotide
sequence of (a) or (b); (d) a polynucleotide sequence which is complementary to the polynucleotide
sequence of (a), (b), or (c); (e) a polynucleotide sequence comprising at least 10, preferably at least 18,
sequential nucleotides of the polynucleotide sequence of (a), (b), (c), or (d); or (f) a polynucleotide
which hybridizes under stringent conditions to the polynucleotide of (a), (b), (c), (d) or (e). Furthermore,
the invention provides an expression vector comprising any of the polynucleotides described above and
host cells comprising the expression vector. Still further, the invention provides a method for treating or
preventing a disease or condition associated with the altered expression of a gene that is coexpressed with
one or more known colon cancer genes comprising administering to a subject in need a polynucleotide
described above in an amount effective for treating or preventing the disease.

In a second aspect, the invention provides a substantially purified polypeptide comprising the 
gene product of a gene that is coexpressed with one or more known colon cancer genes in a plurality of 
biological samples. The known colon cancer gene may be selected from the group consisting of carbonic 
anhydrase I, II, and IV, carcinoembryonic antigen family of proteins, colorectal carcinoma tumor-
associated antigen, down-regulated in adenoma, fatty-acid binding protein, galectin, glutathione
peroxidase, guanylin, cytokeratin 8 and 20, cadherin, and intestinal mucin. Preferred embodiments are 
(a) the polypeptide sequence of SEQ ID NOs:8 and 9; (b) a polypeptide sequence having at least 85% 
identity to the polypeptide sequence of (a); and (c) a polypeptide sequence comprising at least 6 
sequential amino acids of the polypeptide sequence of (a) or (b). Additionally, the invention provides 
antibodies that bind specifically to any of the above described polypeptides and a method for treating or 
preventing a disease or condition associated with the altered expression of a gene that is coexpressed with 
one or more known colon cancer genes comprising administering to a subject in need such an antibody in
an amount effective for treating or preventing the disease. 

In another aspect, the invention provides a pharmaceutical composition comprising the 
polynucleotide of claim 2 or the polypeptide of claim 3 in conjunction with a suitable pharmaceutical 
carrier and a method for treating or preventing a disease or condition associated with the altered 
expression of a gene that is coexpressed with one or more known colon cancer genes comprising
administering to a subject in need such a composition in an amount effective for treating or preventing the
disease. 

In a further aspect, the invention provides a method for diagnosing a disease or condition
associated with the altered expression of a gene that is coexpressed with one or more known colon cancer
genes, wherein each known colon cancer gene is selected from the group consisting of carbonic
anhydrase I, II, and IV, carcinoembryonic antigen family of proteins, colorectal carcinoma
tumor-associated antigen, down-regulated in adenoma, fatty-acid binding protein, galectin, glutathione
peroxidase, guanylin, cytokeratin 8 and 20, cadherin, and intestinal mucin. The method comprises the
steps of (a) providing a sample comprising one or more of the coexpressed genes; (b) hybridizing the
polynucleotide of claim 2 to the coexpressed genes under conditions effective to form one or more
hybridization complexes; (c) detecting the hybridization complexes; and (d) comparing the levels of the
hybridization complexes with the level of hybridization complexes in a non-diseased sample, wherein
altered levels of one or more of the hybridization complexes in a diseased sample compared with the level
of hybridization complexes in a non-diseased sample correlate with the presence of the disease or
condition.

Additionally, the invention provides antibodies, antibody fragments, and immunoconjugates that 
exhibit specificity to any of the above described polypeptides and methods for treating or preventing 
diseases or conditions of the colon. 

BRIEF DESCRIPTION OF THE SEQUENCE LISTING

The Sequence Listing provides exemplary colon cancer gene sequences including polynucleotide 
sequences, SEQ ID NOs:1-7, and the polypeptide sequences, SEQ ID NOs:8 and 9. Each sequence is
identified by a sequence identification number (SEQ ID NO) and by the Incyte clone number with which 
the sequence was first identified. 

DESCRIPTION OF THE INVENTION

It must be noted that as used herein and in the appended claims, the singular forms "a", "an", and 
"the" include the plural reference unless the context clearly dictates otherwise. Thus, for example, a 
reference to "a host cell" includes a plurality of such host cells, and a reference to "an antibody" is a 
reference to one or more antibodies and equivalents thereof known to those skilled in the art, and so forth. 

DEFINITIONS 

"NSEQ" refers generally to a polynucleotide sequence of the present invention, including SEQ ID
NOs:1-7. "PSEQ" refers generally to a polypeptide sequence of the present invention, SEQ ID NOs:8
and 9.

A "fragment" refers to a nucleic acid sequence that is preferably at least 20 nucleotides in
length, more preferably 40 nucleotides, and most preferably 60 nucleotides in length, and
encompasses, for example, fragments consisting of nucleotides 1-50, 51-400, 401-4000, 4001-12,000,
and the like, of SEQ ID NOs:1-7.

"Gene" refers to the partial or complete coding sequence of a gene and to its 5' or 3' untranslated
regions. The gene may be in a sense or antisense (complementary) orientation.
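
As an illustration of the sense and antisense orientations mentioned in the definition of "gene", the following minimal sketch derives the antisense (reverse-complement) strand from a sense strand. The function name and the example sequence are hypothetical and are not taken from the Sequence Listing; only the four unambiguous DNA bases are handled.

```python
# Illustrative helper, not part of the specification: reverse complement
# of a DNA string containing only A, C, G, and T (either case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the antisense strand of `seq`, written 5' to 3'."""
    # Complement each base, then reverse so the result reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical example sequence (not one of SEQ ID NOs:1-7).
sense = "ATGGCCTTA"
antisense = reverse_complement(sense)
print(antisense)
```

Applying the function twice returns the original sense strand, which is a convenient sanity check.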

"Colon cancer gene" refers to a gene whose expression pattern is similar to that of known colon
cancer genes which are useful in the diagnosis, treatment, prognosis, or prevention of diseases of the
colon, particularly colon cancer and other diseases associated with abnormal cell growth. "Known colon
cancer gene" refers to a sequence which has been previously identified as useful in the diagnosis,
treatment, prognosis, or prevention of diseases of the colon. Typically, this means that the known gene is
expressed at higher levels (i.e., has more abundant transcripts) in diseased or cancerous colon tissue than
in normal or non-diseased colon or any other tissue.

"Polynucleotide" refers to a nucleic acid molecule, nucleic acid sequence, oligonucleotide,
nucleotide, or any fragment thereof. It may be DNA or RNA of genomic or synthetic origin,
double-stranded or single-stranded, and combined with carbohydrate, lipids, protein or other materials to
perform a particular activity or form a useful composition. "Oligonucleotide" is substantially equivalent
to the terms amplimer, primer, oligomer, element, and probe.

"Polypeptide" refers to an amino acid molecule, amino acid sequence, oligopeptide, peptide, or
protein or portions thereof whether naturally occurring or synthetic.

A "portion" refers to a peptide sequence which is preferably at least 5 to about 15 amino acids in
length, most preferably at least 10 amino acids long, and which retains some biological or immunological
activity of, for example, a portion of SEQ ID NOs:8 and 9.

"Sample" is used in its broadest sense. A sample containing nucleic acids may comprise a bodily
fluid; an extract from a cell, chromosome, organelle, or membrane isolated from a cell; genomic DNA,
RNA, or cDNA in solution or bound to a substrate; a cell; a tissue; a tissue print; and the like.

"Substantially purified" refers to a nucleic acid or an amino acid sequence that is removed from
its natural environment and that is isolated or separated, and is at least about 60% free, preferably about
75% free, and most preferably about 90% free, from other components with which it is naturally present.

"Substrate" refers to any suitable rigid or semi-rigid support to which polynucleotides or
polypeptides are bound and includes membranes, filters, chips, slides, wafers, fibers, magnetic or
nonmagnetic beads, gels, capillaries or other tubing, plates, polymers, and microparticles with a variety of
surface forms including wells, trenches, pins, channels, and pores.

A "variant" refers to a polynucleotide whose sequence diverges from SEQ ID NOs:1-7 or to a
polypeptide whose sequence diverges from SEQ ID NOs:8 and 9, respectively. Polynucleotide sequence
divergence may result from mutational changes such as deletions, additions, and substitutions of one or
more nucleotides; it may also be introduced to accommodate differences in codon usage. Each of these
types of changes may occur alone, or in combination, one or more times in a given sequence. Polypeptide
variants include sequences that possess at least one structural or functional characteristic of SEQ ID
NOs:8 and 9.

THE INVENTION 

The present invention encompasses a method for identifying biomolecules that are associated
with a specific disease, regulatory pathway, subcellular compartment, cell type, tissue type, or species. In
particular, the method identifies genes useful in diagnosis, prognosis, treatment, prevention, and
evaluation of therapies for diseases of the colon including, but not limited to, colon cancer, metastatic
colon cancer, atrophic gastritis, cholecystitis, Crohn's disease, irritable bowel syndrome, ulcerative
colitis, and the like.

The method entails first identifying polynucleotides that are expressed in a plurality of cDNA
libraries. The identified polynucleotides include genes of known or unknown function which are known
to be expressed in a specific disease process, subcellular compartment, cell type, tissue type, or species.
The expression patterns of the genes with known function are compared with those of the genes with
unknown function to determine whether a specified coexpression probability threshold is met. Through
this comparison, a subset of the polynucleotides having a high coexpression probability with the known
genes can be identified. The high coexpression probability correlates with a particular coexpression
probability threshold which is preferably less than 0.001 and more preferably less than 0.00001.

The polynucleotides originate from cDNA libraries derived from a variety of sources including,
but not limited to, eukaryotes such as human, mouse, rat, dog, monkey, plant, and yeast; prokaryotes
such as bacteria; and viruses. These polynucleotides can also be selected from a variety of sequence
types including, but not limited to, expressed sequence tags (ESTs), assembled polynucleotide sequences,
full length gene coding regions, promoters, introns, enhancers, 5' untranslated regions, and 3' untranslated
regions. To have statistically significant analytical results, the polynucleotides need to be expressed in at
least three cDNA libraries.
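
The three-library minimum stated above can be sketched as a simple filter. This is an illustrative example only: the input format (pairs of gene and library identifiers) and all identifiers in the sample data are hypothetical, and the specification does not describe how detections were actually stored.

```python
from collections import defaultdict

MIN_LIBRARIES = 3  # minimum number of cDNA libraries stated in the text


def genes_meeting_minimum(detections, min_libraries=MIN_LIBRARIES):
    """Return the sorted gene ids detected in at least `min_libraries` libraries.

    `detections` is an iterable of (gene_id, library_id) pairs, one pair per
    detected cDNA fragment. Repeated detections of a gene in the same
    library count once.
    """
    libraries_by_gene = defaultdict(set)
    for gene_id, library_id in detections:
        libraries_by_gene[gene_id].add(library_id)
    return sorted(
        gene_id
        for gene_id, libraries in libraries_by_gene.items()
        if len(libraries) >= min_libraries
    )


# Hypothetical detections: geneA appears in three libraries, geneB in two.
example = [
    ("geneA", "lib1"), ("geneA", "lib2"), ("geneA", "lib3"),
    ("geneB", "lib1"), ("geneB", "lib1"), ("geneB", "lib2"),
]
print(genes_meeting_minimum(example))
```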

The cDNA libraries used in the coexpression analysis of the present invention can be obtained
from adrenal gland, biliary tract, bladder, blood cells, blood vessels, bone marrow, brain, bronchus,
cartilage, chromaffin system, colon, connective tissue, cultured cells, embryonic stem cells, endocrine
glands, epithelium, esophagus, fetus, ganglia, heart, hypothalamus, immune system, intestine, islets of
Langerhans, kidney, larynx, liver, lung, lymph, muscles, neurons, ovary, pancreas, penis, peripheral
nervous system, phagocytes, pituitary, placenta, pleura, prostate, salivary glands, seminal vesicles,
skeleton, spleen, stomach, testis, thymus, tongue, ureter, uterus, and the like. The number of cDNA
libraries selected can range from as few as 3 to greater than 10,000. Preferably, the number of the cDNA
libraries is greater than 500.

In a preferred embodiment, genes are assembled to reflect related sequences, such as assembled
sequence fragments derived from a single transcript. Assembly of the polynucleotide sequences can be
performed using sequences of various types including, but not limited to, ESTs, extensions, or shotgun
sequences. In a most preferred embodiment, the polynucleotide sequences are derived from human
sequences that have been assembled using the algorithm disclosed in "System and Methods for Analyzing
Biomolecular Sequences", USSN 09/276,534, filed March 25, 1999, incorporated herein by reference.

Experimentally, differential expression of the polynucleotides can be evaluated by methods
including, but not limited to, differential display by spatial immobilization or by gel electrophoresis,
genome mismatch scanning, representational difference analysis, and transcript imaging. Additionally,
differential expression can be assessed by microarray technology. These methods may be used alone or
in combination.

Known colon cancer genes can be selected based on the use of these genes as diagnostic or
prognostic markers or as therapeutic targets. Preferably, the known colon cancer genes include carbonic
anhydrase I, II, and IV, carcinoembryonic antigen family of proteins, colorectal carcinoma tumor- 
associated antigen, down-regulated in adenoma, fatty-acid binding protein, galectin, glutathione 
peroxidase, guanylin, cytokeratin 8 and 20, cadherin, intestinal mucin, and the like. 

The procedure for identifying novel genes that exhibit a statistically significant coexpression 
pattern with known colon cancer genes is as follows. First, the presence or absence of a gene in a cDNA 
library is defined: a gene is present in a cDNA library when at least one cDNA fragment corresponding 
to that gene is detected in a cDNA sample taken from the library, and a gene is absent from a library when 
no corresponding cDNA fragment is detected in the sample. 

Second, the significance of gene coexpression is evaluated using a probability method to measure 
a due-to-chance probability of the coexpression. The probability method can be the Fisher exact test, the 
chi-squared test, or the kappa test. These tests and examples of their applications are well known in the 
art and can be found in standard statistics texts (Agresti (1990) Categorical Data Analysis, John Wiley &
Sons, New York NY; Rice (1988) Mathematical Statistics and Data Analysis, Duxbury Press, Pacific
Grove CA). A Bonferroni correction (Rice, supra, page 384) can also be applied in combination with one
of the probability methods for correcting statistical results of one gene versus multiple other genes. In a
preferred embodiment, the due-to-chance probability is measured by a Fisher exact test, and the threshold
of the due-to-chance probability is set preferably to less than 0.001, more preferably to less than 0.00001.

To determine whether two genes, A and B, have similar coexpression patterns, occurrence data 
vectors can be generated as illustrated in Table 1. The presence of a gene occurring at least once in a
library is indicated by a one, and its absence from the library, by a zero. 

Table 1. Occurrence data for genes A and B

          Library 1   Library 2   Library 3   ...   Library N
gene A        1           1           0       ...       0
gene B        1           0           1       ...       0



For a given pair of genes, the occurrence data in Table 1 can be summarized in a 2 x 2 contingency table. 



Table 2. Contingency table for co-occurrences of genes A and B

                  Gene A present   Gene A absent   Total
Gene B present           8                2          10
Gene B absent            2               18          20
Total                   10               20          30



Table 2 presents co-occurrence data for gene A and gene B in a total of 30 libraries. Both gene A
and gene B occur 10 times in the libraries. Table 2 summarizes and presents: 1) the number of times
gene A and B are both present in a library, 2) the number of times gene A and B are both absent in a
library, 3) the number of times gene A is present and gene B is absent, and 4) the number of times gene
B is present and gene A is absent. The upper left entry is the number of times the two genes co-occur in a
library, and the entry for gene A absent and gene B absent is the number of times neither gene occurs in
a library. The off-diagonal entries are the number of times one gene occurs and the other does not. Genes
A and B are both present eight times and both absent 18 times. Gene A is present and gene B is absent two
times; and gene B is present and gene A is absent two times. The probability ("p-value") that the above
association occurs due to chance as calculated using a Fisher exact test is 0.0003. Associations are
generally considered significant if a p-value is less than 0.01 (Agresti, supra; Rice, supra).
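
The calculation described above can be reproduced with a short sketch that tallies a 2 x 2 table from two occurrence vectors and evaluates the upper tail of the hypergeometric distribution. This is an illustration only: the function names are hypothetical, the example vectors are constructed to match the counts in Table 2 rather than taken from real library data, and the use of a one-sided test (probability of at least the observed number of co-occurrences) is an assumption, since the text does not state which form of the Fisher exact test was applied.

```python
from math import comb


def contingency(vec_a, vec_b):
    """Tally (both, a_only, b_only, neither) from two 0/1 occurrence vectors."""
    if len(vec_a) != len(vec_b):
        raise ValueError("occurrence vectors must cover the same libraries")
    both = sum(1 for a, b in zip(vec_a, vec_b) if a and b)
    a_only = sum(1 for a, b in zip(vec_a, vec_b) if a and not b)
    b_only = sum(1 for a, b in zip(vec_a, vec_b) if b and not a)
    neither = sum(1 for a, b in zip(vec_a, vec_b) if not a and not b)
    return both, a_only, b_only, neither


def fisher_exact_upper_tail(both, a_only, b_only, neither):
    """One-sided Fisher exact p-value: P(co-occurrences >= observed).

    With the row and column totals fixed, the number of co-occurrences
    follows a hypergeometric distribution.
    """
    total = both + a_only + b_only + neither
    a_total = both + a_only
    b_total = both + b_only
    denominator = comb(total, b_total)
    numerator = sum(
        comb(a_total, k) * comb(total - a_total, b_total - k)
        for k in range(both, min(a_total, b_total) + 1)
    )
    return numerator / denominator


# Vectors constructed to reproduce the counts in Table 2 (30 libraries).
gene_a = [1] * 8 + [1] * 2 + [0] * 2 + [0] * 18
gene_b = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 18

counts = contingency(gene_a, gene_b)
p_value = fisher_exact_upper_tail(*counts)
print(counts, round(p_value, 4))
```

For the Table 2 counts the sum has three terms (8, 9, and 10 co-occurrences), giving 8751/30045015, which rounds to the 0.0003 quoted in the text.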

This method of estimating the probability for coexpression of two genes makes several
assumptions. The method assumes that the libraries are independent and are identically sampled.
However, in practical situations, the selected cDNA libraries are not entirely independent, because more
than one library may be obtained from a single subject or tissue. Nor are they entirely identically
sampled, because different numbers of cDNAs may be sequenced from each library. The number of
cDNAs sequenced typically ranges from 5,000 to 10,000 cDNAs per library. In addition, because a
Fisher exact coexpression probability is calculated for each gene versus 41,419 other assembled genes, a
Bonferroni correction for multiple statistical tests is necessary.
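
A Bonferroni correction of the kind referred to above can be sketched in two equivalent ways: by dividing the significance level by the number of tests, or by multiplying each p-value by the number of tests and capping the result at 1. The sketch below is illustrative; the function names are hypothetical, and the text does not state whether the 0.001 and 0.00001 thresholds were applied before or after correction, so the printed value is only an example of the arithmetic.

```python
N_COMPARISONS = 41_419  # number of other assembled genes stated in the text


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise level at alpha."""
    return alpha / n_tests


def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1.0."""
    return min(1.0, p_value * n_tests)


# Example arithmetic only: per-test level for a family-wise level of 0.001.
print(bonferroni_threshold(0.001, N_COMPARISONS))
```

Comparing a raw p-value with `bonferroni_threshold(alpha, n)` gives the same decision as comparing `bonferroni_adjust(p, n)` with `alpha`, except when the adjusted value is capped at 1.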

Using the method of the present invention, we have identified seven novel genes that exhibit
strong association, or coexpression, with known genes that are specific to colon cancer. These known
colon cancer genes include carbonic anhydrase I, II, and IV, carcinoembryonic antigen family of proteins,
colorectal carcinoma tumor-associated antigen, down-regulated in adenoma, fatty-acid binding protein,
galectin, glutathione peroxidase, guanylin, cytokeratin 8 and 20, cadherin, and intestinal mucin. The
results presented in Table 6 show that the expression of the seven novel genes has direct or indirect
association with the expression of known colon cancer genes. Therefore, the novel genes can potentially
be used in diagnosis, treatment, prognosis, or prevention of diseases of the colon or in the evaluation of
therapies for diseases of the colon. Further, the gene products of the seven novel genes are either
potential therapeutic proteins or targets of therapeutics against diseases of the colon.

Therefore, in one embodiment, the present invention encompasses a polynucleotide sequence
comprising the sequence of SEQ ID NOs:1-7. These seven polynucleotides are shown by the method of
the present invention to have strong coexpression association with known colon cancer genes and with
each other. The invention also encompasses a variant of the polynucleotide sequence, its complement, or
18 consecutive nucleotides of a sequence provided in the above described sequences. Variant
polynucleotide sequences typically have at least about 75%, more preferably at least about 85%, and most
preferably at least about 95% polynucleotide sequence identity to NSEQ.
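
The identity thresholds above can be illustrated with a minimal sketch that scores two sequences which have already been aligned (gaps written as "-"). This is an assumption-laden example: the specification does not define how percent identity is computed, and definitions differ in their choice of denominator. Here the denominator is the number of alignment columns that are not gaps in both sequences. The function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_variant_threshold(aligned_a: str, aligned_b: str, minimum: float = 75.0) -> bool:
    """True when identity is at least `minimum` percent (75% by default)."""
    return percent_identity(aligned_a, aligned_b) >= minimum


# Hypothetical aligned sequences differing at one position out of ten.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```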

NSEQ or the encoded PSEQ may be used to search against the GenBank primate (pri), rodent
(rod), mammalian (mam), vertebrate (vrtp), and eukaryote (eukp) databases, SwissProt, BLOCKS
(Bairoch et al. (1997) Nucleic Acids Res 25:217-221), PFAM, and other databases that contain previously
identified and annotated motifs, sequences, and gene functions. Methods that search for primary
sequence patterns with secondary structure gap penalties (Smith et al. (1992) Protein Engineering
5:35-51) as well as algorithms such as Basic Local Alignment Search Tool (BLAST; Altschul (1993) J Mol
Evol 36:290-300; Altschul et al. (1990) J Mol Biol 215:403-410), BLOCKS (Henikoff and Henikoff
(1991) Nucleic Acids Research 19:6565-6572), Hidden Markov Models (HMM; Eddy (1996) Cur Opin
Str Biol 6:361-365; Sonnhammer et al. (1997) Proteins 28:405-420), and the like, can be used to
manipulate and analyze nucleotide and amino acid sequences. These databases, algorithms and other
methods are well known in the art and are described in Ausubel et al. (1997; Short Protocols in Molecular
Biology, John Wiley & Sons, New York NY, unit 7.7) and in Meyers (1995; Molecular Biology and
Biotechnology, Wiley VCH, New York NY, p 856-853).

Also encompassed by the invention are polynucleotide sequences that are capable of hybridizing 
to SEQ ID NOs: 1-7, and fragments thereof under stringent conditions. Stringent conditions can be 
defined by salt concentration, temperature, and other chemicals and conditions well known in the art. 
Suitable conditions can be selected, for example, by varying the concentrations of salt in the 
prehybridization, hybridization, and wash solutions or by varying the hybridization and wash 
temperatures. With some substrates, the temperature can be decreased by adding formamide to the 
prehybridization and hybridization solutions. 

Hybridization can be performed at low stringency, with buffers such as 5xSSC with 1% sodium 
dodecyl sulfate (SDS) at 60° C, which permits complex formation between two nucleic acid sequences 
that contain some mismatches. Subsequent washes are performed at higher stringency with buffers such 
as 0.2xSSC with 0.1% SDS at either 45° C (medium stringency) or 68° C (high stringency), to maintain 
hybridization of only those complexes that contain completely complementary sequences. Background 
signals can be reduced by the use of detergents such as SDS, Sarcosyl, or Triton X-100, and/or a blocking 
agent, such as salmon sperm DNA. Hybridization methods are described in detail in Ausubel (supra,
units 2.8-2.11, 3.18-3.19 and 4.6-4.9) and Sambrook et al. (1989; Molecular Cloning, A Laboratory
Manual, Cold Spring Harbor Press, Plainview NY).

NSEQ can be extended utilizing a partial nucleotide sequence and employing various PCR-based 
methods known in the art to detect upstream sequences such as promoters and other regulatory elements. 
(See, e.g., Dieffenbach and Dveksler (1995) PCR Primer, a Laboratory Manual, Cold Spring Harbor
Press, Plainview NY.) Additionally, one may use an XL-PCR kit (PE Biosystems, Foster City CA),
nested primers, and commercially available cDNA (Life Technologies, Rockville MD) or genomic
libraries (Clontech, Palo Alto CA) to extend the sequence. For all PCR-based methods, primers may be
designed using commercially available software, such as OLIGO 4.06 Primer analysis software (National
Biosciences, Plymouth MN) or another appropriate program, to be about 18 to 30 nucleotides in length, to
have a GC content of about 50%, and to form a hybridization complex at temperatures of about 68°C to
72°C.
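
The length and GC-content criteria above can be checked with a minimal sketch such as the following. It is not a substitute for the primer design software named in the text and does not estimate hybridization temperature. The function names, the example primers, and in particular the tolerance used to interpret "about 50%" are hypothetical choices made for illustration.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a non-empty primer sequence."""
    upper = primer.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def primer_passes(
    primer: str,
    min_len: int = 18,          # lower length bound from the text
    max_len: int = 30,          # upper length bound from the text
    gc_target: float = 0.50,    # "about 50%" from the text
    gc_tolerance: float = 0.10, # hypothetical reading of "about"
) -> bool:
    """Check primer length and GC content against the stated guidelines."""
    if not primer:
        return False
    if not (min_len <= len(primer) <= max_len):
        return False
    return abs(gc_fraction(primer) - gc_target) <= gc_tolerance


# Hypothetical primers, not derived from SEQ ID NOs:1-7.
print(primer_passes("ATGCATGCATGCATGCATGC"))  # 20 nt, half G/C
print(primer_passes("ATATATATATATATATAT"))    # 18 nt, no G/C
```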

In another aspect of the invention, NSEQ can be cloned in recombinant DNA molecules that
direct the expression of PSEQ or structural or functional fragments thereof, in appropriate host cells. Due
to the inherent degeneracy of the genetic code, other DNA sequences which encode substantially the same
or a functionally equivalent amino acid sequence may be produced and used to express the polypeptide
encoded by NSEQ. The nucleotide sequences of the present invention can be engineered using methods
generally known in the art in order to alter the nucleotide sequences for a variety of purposes including,
but not limited to, modification of the cloning, processing, and/or expression of the gene product. DNA
shuffling by random fragmentation and PCR reassembly of gene fragments and synthetic oligonucleotides
may be used to engineer the nucleotide sequences. For example, oligonucleotide-mediated site-directed
mutagenesis may be used to introduce mutations that create new restriction sites, alter glycosylation
patterns, change codon preference, produce splice variants, and so forth.

In order to express a biologically active protein, NSEQ, or derivatives thereof, may be inserted
into an appropriate expression vector, i.e., a vector which contains the necessary elements for
transcriptional and translational control of the inserted coding sequence in a particular host. These
elements include regulatory sequences, such as enhancers, constitutive and inducible promoters, and 5'
and 3' untranslated regions. Methods which are well known to those skilled in the art may be used to
construct such expression vectors. These methods include in vitro recombinant DNA techniques,
synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, supra; and Ausubel,
supra.)

A variety of expression vector/host cell systems may be utilized to express NSEQ. These include,
but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage,
plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell
systems infected with baculovirus vectors; plant cell systems transformed with viral or bacterial
expression vectors; or animal cell systems. For long term production of recombinant proteins in
mammalian systems, stable expression in cell lines is preferred. For example, NSEQ can be transformed
into cell lines using expression vectors which may contain viral origins of replication and/or endogenous
expression elements and a selectable or visible marker gene on the same or on a separate vector. The
invention is not to be limited by the vector or host cell employed.

In general, host cells that contain NSEQ and that express PSEQ may be identified by a variety of
procedures known to those of skill in the art. These procedures include, but are not limited to,
DNA-DNA or DNA-RNA hybridizations, PCR amplification, and protein bioassay or immunoassay
techniques which include membrane, solution, or chip based technologies for the detection and/or
quantification of nucleic acid or protein sequences. Immunological methods for detecting and measuring
the expression of PSEQ using either specific polyclonal or monoclonal antibodies are known in the art.
Examples of such techniques include enzyme-linked immunosorbent assays (ELISAs),
radioimmunoassays (RIAs), and fluorescence activated cell sorting (FACS).

Host cells transformed with NSEQ may be cultured under conditions suitable for the expression
and recovery of the protein from cell culture. The protein produced by a transgenic cell may be secreted
or retained intracellularly depending on the sequence and/or the vector used. As will be understood by
those of skill in the art, expression vectors containing NSEQ may be designed to contain signal sequences
which direct secretion of the protein through a prokaryotic or eukaryotic cell membrane.

In addition, a host cell strain may be chosen for its ability to modulate expression of the inserted
sequences or to process the expressed protein in the desired fashion. Such modifications of the
polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation,
lipidation, and acylation. Post-translational processing which cleaves a "prepro" form of the protein may
also be used to specify protein targeting, folding, and/or activity. Different host cells which have specific
cellular machinery and characteristic mechanisms for post-translational activities (e.g., CHO, HeLa,
MDCK, HEK293, and WI38) are available from the American Type Culture Collection (ATCC, Manassas
VA) and may be chosen to ensure the correct modification and processing of the expressed protein.

In another embodiment of the invention, natural, modified, or recombinant nucleic acid sequences
are ligated to a heterologous sequence resulting in translation of a fusion protein containing heterologous
protein moieties in any of the aforementioned host systems. Such heterologous protein moieties facilitate
purification of fusion proteins using commercially available affinity matrices. Such moieties include, but
are not limited to, glutathione S-transferase, maltose binding protein, thioredoxin, calmodulin binding
peptide, 6-His, FLAG, c-myc, hemagglutinin, and monoclonal antibody epitopes.

In another embodiment, the nucleic acid sequences are synthesized, in whole or in part, using
chemical or enzymatic methods well known in the art (Caruthers et al. (1980) Nucl Acids Symp Ser (7)
215-233; Ausubel, supra). For example, peptide synthesis can be performed using various solid-phase
techniques (Roberge et al. (1995) Science 269:202-204), and machines such as the ABI 431A Peptide
synthesizer (PE Biosystems) can be used to automate synthesis. If desired, the amino acid sequence may
be altered during synthesis and/or combined with sequences from other proteins to produce a variant
protein.

In another embodiment, the invention entails a substantially purified polypeptide comprising the 
amino acid sequence of SEQ ID NOs:8 and 9 or fragments thereof. 
DIAGNOSTICS and THERAPEUTICS 
The polynucleotide sequences can be used in diagnosis, prognosis, treatment, prevention, and 
evaluation of therapies for diseases of the colon including, but not limited to, colon cancer, metastatic colon 
cancer, atrophic gastritis, cholecystitis, Crohn's disease, irritable bowel syndrome, ulcerative colitis, and 
the like. 

In one preferred embodiment, the polynucleotide sequences are used for diagnostic purposes to 
determine the absence, presence, or excess expression of the protein. The polynucleotides may be at 
least 18 nucleotides long and consist of complementary RNA and DNA molecules, branched nucleic 
acids, and/or peptide nucleic acids (PNAs). In one alternative, the polynucleotides are used to detect and 
quantify gene expression in samples in which expression of NSEQ is correlated with disease. In another 
alternative, NSEQ can be used to detect genetic polymorphisms associated with a disease. These 
polymorphisms may be detected in the transcript cDNA. 

The specificity of the probe is determined by whether it is made from a unique region, a 
regulatory region, or from a conserved motif. Both probe specificity and the stringency of diagnostic 
hybridization or amplification (maximal, high, intermediate, or low) will determine whether the probe 
identifies only naturally occurring, exactly complementary sequences, allelic variants, or related 
sequences. Probes designed to detect related sequences should preferably have at least 75% sequence 
identity to any of the nucleic acid sequences encoding PSEQ. 

Methods for producing hybridization probes include the cloning of nucleic acid sequences into 
vectors for the production of mRNA probes. Such vectors are known in the art, are commercially 
available, and may be used to synthesize RNA probes in vitro by adding appropriate RNA polymerases 
and labeled nucleotides. Hybridization probes may incorporate nucleotides labeled by a variety of 
reporter groups including, but not limited to, radionuclides such as 32P or 35S, enzymatic labels such as 
alkaline phosphatase coupled to the probe via avidin/biotin coupling systems, fluorescent labels, and the 
like. The labeled polynucleotide sequences may be used in Southern or northern analysis, dot blot, or 
other membrane-based technologies; in PCR technologies; and in microarrays utilizing samples from 
subjects to detect altered PSEQ expression. 

NSEQ can be labeled by standard methods and added to a sample from a subject under conditions 
suitable for the formation and detection of hybridization complexes. After incubation the sample is 
washed, and the signal associated with hybrid complex formation is quantitated and compared with a 
standard value. Standard values are derived from any control sample, typically one that is free of the 
suspect disease. If the amount of signal in the subject sample is altered in comparison to the standard 
value, the altered level of expression in the sample indicates the presence of the disease. 
Qualitative and quantitative methods for comparing the hybridization complexes formed in subject 
samples with previously established standards are well known in the art. 

Such assays may also be used to evaluate the efficacy of a particular therapeutic treatment 
regimen in animal studies, in clinical trials, or to monitor the treatment of an individual subject. Once the 
presence of disease is established and a treatment protocol is initiated, hybridization or amplification 
assays can be repeated on a regular basis to determine if the level of expression in the subject begins to 
approximate that which is observed in a healthy subject. The results obtained from successive assays may 
be used to show the efficacy of treatment over a period ranging from several days to many years. 

The polynucleotides may be used for the diagnosis of a variety of diseases associated with the 
colon. These include, but are not limited to, colon cancer, metastatic colon cancer, atrophic gastritis, 
cholecystitis, Crohn's disease, irritable bowel syndrome, ulcerative colitis, and the like. 

The polynucleotides may also be used as targets in a microarray. The microarray can be used to 
monitor the expression patterns of large numbers of genes simultaneously and to identify splice variants, 
mutations, and polymorphisms. Information derived from analyses of the expression patterns may be 
used to determine gene function, to understand the genetic basis of a disease, to diagnose a disease, and to 
develop and monitor the activities of therapeutic agents used to treat a disease. Microarrays may also be 
used to detect genetic diversity at the genome level, such as single nucleotide polymorphisms which may 
characterize a particular population. 

In yet another alternative, polynucleotides may be used to generate hybridization probes useful in 
mapping the naturally occurring genomic sequence. Fluorescent in situ hybridization (FISH) may be 
correlated with other physical chromosome mapping techniques and genetic map data as described in 
Heinz-Ulrich et al. (In: Meyers, supra, pp 965-968). 

In another embodiment, antibodies or antibody fragments comprising an antigen binding site that 
specifically binds PSEQ may be used for the diagnosis of diseases characterized by the over- or under-expression 
of PSEQ. A variety of protocols for measuring PSEQ, including ELISAs, RIAs, and FACS, 
are well known in the art and provide a basis for diagnosing altered or abnormal levels of expression. 
Standard values for PSEQ expression are established by combining samples taken from healthy subjects, 
preferably human, with antibody to PSEQ under conditions suitable for complex formation. The amount 
of complex formation may be quantitated by various methods, preferably by photometric means. 
Quantities of PSEQ expressed in disease samples are compared with standard values. Deviation between 
standard and subject values establishes the parameters for diagnosing or monitoring disease. 
Alternatively, one may use competitive drug screening assays in which neutralizing antibodies capable of 
binding PSEQ specifically compete with a test compound for binding the protein. Antibodies can be used 
to detect the presence of any peptide which shares one or more antigenic determinants with PSEQ. In one 
aspect, the anti-PSEQ antibodies of the present invention can be used for treatment or monitoring 
therapeutic treatment for diseases of the colon, particularly colon cancer. 
In another aspect, the NSEQ, or its complement, may be used therapeutically for the purpose of 
expressing mRNA and protein, or conversely to block transcription or translation of the mRNA. 
Expression vectors may be constructed using elements from retroviruses, adenoviruses, herpes or vaccinia 
viruses, or bacterial plasmids, and the like. These vectors may be used for delivery of nucleotide 
sequences to a particular target organ, tissue, or cell population. Methods well known to those skilled in 
the art can be used to construct vectors to express nucleic acid sequences or their complements. (See, 
e.g., Maulik et al. (1997) Molecular Biotechnology: Therapeutic Applications and Strategies. Wiley-Liss, 
New York NY.) Alternatively, NSEQ, or its complement, may be used for somatic cell or stem cell gene 
therapy. Vectors may be introduced in vivo, in vitro, and ex vivo. For ex vivo therapy, vectors are 
introduced into stem cells taken from the subject, and the resulting transgenic cells are clonally 
propagated for autologous transplant back into that same subject. Delivery of NSEQ by transfection, 
liposome injections, or polycationic amino polymers may be achieved using methods which are well 
known in the art. (See, e.g., Goldman et al. (1997) Nature Biotechnology 15:462-466.) Additionally, 
endogenous NSEQ expression may be inactivated using homologous recombination methods which insert 
an inactive gene sequence into the coding region or other appropriate targeted region of NSEQ. (See, e.g., 
Thomas et al. (1987) Cell 51:503-512.) 

Vectors containing NSEQ can be transformed into a cell or tissue to express a missing protein or 
to replace a nonfunctional protein. Similarly, a vector constructed to express the complement of NSEQ 
can be transformed into a cell to downregulate the overexpression of PSEQ. Complementary or antisense 
sequences may consist of an oligonucleotide derived from the transcription initiation site; nucleotides 
between about positions -10 and +10 from the ATG are preferred. Similarly, inhibition can be achieved 
using triple helix base-pairing methodology. Triple helix pairing is useful because it causes inhibition of 
the ability of the double helix to open sufficiently for the binding of polymerases, transcription factors, or 
regulatory molecules. Recent therapeutic advances using triplex DNA have been described in the 
literature. (See, e.g., Gee et al. In: Huber and Carr (1994) Molecular and Immunologic Approaches. 
Futura Publishing, Mt. Kisco NY, pp 163-177.) 
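
By way of illustration only, the choice of an antisense sequence spanning about positions -10 to +10 around the ATG can be sketched in Python. The function name, the example sequence, and the numbering convention (the A of the ATG taken as +1, with no position 0) are assumptions made for this sketch and are not taken from the disclosure.

```python
# Hypothetical helper for illustration; not part of the disclosed methods.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_window(sense_dna: str, atg_index: int,
                     upstream: int = 10, downstream: int = 10) -> str:
    """Return the reverse complement of the region from -upstream to
    +downstream around the ATG (assumed convention: A of ATG is +1)."""
    sense_dna = sense_dna.upper()
    if sense_dna[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG codon")
    start = atg_index - upstream
    end = atg_index + downstream
    if start < 0 or end > len(sense_dna):
        raise ValueError("sequence is too short for the requested window")
    window = sense_dna[start:end]
    # Complement each base, then reverse to obtain the antisense strand 5'->3'.
    return window.translate(COMPLEMENT)[::-1]


# Made-up sense-strand sequence with the ATG starting at index 10.
oligo = antisense_window("GGGGGCCCCCATGAAATTTCGG", atg_index=10)
print(oligo, len(oligo))
```

The window length is left as a parameter because the text gives the positions only approximately ("about").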

Ribozymes, enzymatic RNA molecules, may also be used to catalyze the cleavage of mRNA and 
decrease the levels of particular mRNAs, such as those comprising the polynucleotide sequences of the 
invention. (See, e.g., Rossi (1994) Current Biology 4:469-471.) Ribozymes may cleave mRNA at 
specific cleavage sites. Alternatively, ribozymes may cleave mRNAs at locations dictated by flanking 
regions that form complementary base pairs with the target mRNA. The construction and production of 
ribozymes is well known in the art and is described in Meyers (supra). 

RNA molecules may be modified to increase intracellular stability and half-life. Possible 
modifications include, but are not limited to, the addition of flanking sequences at the 5' and/or 3' ends of 
the molecule, or the use of phosphorothioate or 2' O-methyl rather than phosphodiester linkages within 
the backbone of the molecule. Alternatively, nontraditional bases such as inosine, queuosine, and 
wybutosine, as well as acetyl-, methyl-, thio-, and similarly modified forms of adenine, cytidine, guanine, 
thymine, and uridine which are not as easily recognized by endogenous endonucleases, may be included. 

Further, an antagonist, or an antibody that binds specifically to PSEQ, may be administered to a 
subject to treat or prevent a disease associated with colon cancer. The antagonist, antibody, or fragment 
may be used directly to inhibit the activity of the protein or indirectly to deliver a therapeutic agent to 
cells or tissues which express the PSEQ. An immunoconjugate comprising a PSEQ binding site of the 
antibody or the antagonist and a therapeutic agent may be administered to a subject in need to treat or 
prevent disease. The therapeutic agent may be a cytotoxic agent selected from a group including, but not 
limited to, abrin, ricin, doxorubicin, daunorubicin, taxol, ethidium bromide, mitomycin, etoposide, 
teniposide, vincristine, vinblastine, colchicine, dihydroxy anthracin dione, actinomycin D, diphtheria 
toxin, Pseudomonas exotoxin A and 40, radioisotopes, and glucocorticoid. 

Antibodies to PSEQ may be generated using methods that are well known in the art. Such 
antibodies may include, but are not limited to, polyclonal, monoclonal, chimeric, and single chain 
antibodies, Fab fragments, and fragments produced by a Fab expression library. Neutralizing antibodies, 
such as those which inhibit dimer formation, are especially preferred for therapeutic use. Monoclonal 
antibodies to PSEQ may be prepared using any technique which provides for the production of antibody 
molecules by continuous cell lines in culture. These include, but are not limited to, the hybridoma, the 
human B-cell hybridoma, and the EBV-hybridoma techniques. In addition, techniques developed for the 
production of chimeric antibodies can be used. (See, e.g., Pound (1998) Immunochemical Protocols, 
Methods Mol Biol, Vol 80.) Alternatively, techniques described for the production of single chain 
antibodies may be employed. Antibody fragments which contain specific binding sites for PSEQ may 
also be generated. Various immunoassays may be used to identify antibodies having the desired 
specificity. Numerous protocols for competitive binding or immunoradiometric assays using either 
polyclonal or monoclonal antibodies with established specificities are well known in the art. 

Yet further, an agonist of PSEQ may be administered to a subject to treat or prevent a disease 
associated with decreased expression, longevity or activity of PSEQ. 

An additional aspect of the invention relates to the administration of a pharmaceutical or sterile 
composition, in conjunction with a pharmaceutically acceptable carrier, for any of the therapeutic 
applications discussed above. Such pharmaceutical compositions may consist of PSEQ or antibodies, 
mimetics, agonists, antagonists, or inhibitors of the polypeptide. The compositions may be administered 
alone or in combination with at least one other agent, such as a stabilizing compound, which may be 
administered in any sterile, biocompatible pharmaceutical carrier including, but not limited to, saline, 
buffered saline, dextrose, and water. The compositions may be administered to a subject alone or in 
combination with other agents, drugs, or hormones. 

The pharmaceutical compositions utilized in this invention may be administered by any number 
of routes including, but not limited to, oral, intravenous, intramuscular, intra-arterial, intramedullary, 
intrathecal, intraventricular, transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, 
sublingual, or rectal means. 

In addition to the active ingredients, these pharmaceutical compositions may contain suitable 
pharmaceutically acceptable carriers comprising excipients and auxiliaries which facilitate processing of 
the active compounds into preparations which can be used pharmaceutically. Further details on 
techniques for formulation and administration may be found in the latest edition of Remington's 
Pharmaceutical Sciences (Mack Publishing, Easton PA). 

For any compound, the therapeutically effective dose can be estimated initially either in cell 
culture assays or in animal models such as mice, rats, rabbits, dogs, or pigs. An animal model may also 
be used to determine the appropriate concentration range and route of administration. Such information 
can then be used to determine useful doses and routes for administration in humans. 

A therapeutically effective dose refers to that amount of active ingredient which ameliorates the 
symptoms or condition. Therapeutic efficacy and toxicity may be determined by standard pharmaceutical 
procedures in cell cultures or with experimental animals, such as by calculating and contrasting the ED50 
(the dose therapeutically effective in 50% of the population) and LD50 (the dose lethal to 50% of the 
population) statistics. Any of the therapeutic compositions described above may be applied to any subject 
in need of such therapy, including, but not limited to, mammals such as dogs, cats, cows, horses, rabbits, 
monkeys, and most preferably, humans. 
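
One common way to contrast the ED50 and LD50 statistics is their ratio, LD50/ED50, usually called the therapeutic index. The following Python sketch shows that arithmetic only; the function name and the dose values are hypothetical and are not data from this disclosure.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Ratio of the dose lethal to 50% of the population (LD50) to the dose
    therapeutically effective in 50% of the population (ED50).
    Both doses must be positive and expressed in the same units."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("doses must be positive")
    return ld50 / ed50


# Hypothetical doses in mg/kg, chosen only to show the calculation.
print(therapeutic_index(ld50=200.0, ed50=10.0))
```

A larger ratio corresponds to a wider margin between the effective dose and the lethal dose.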

EXAMPLES 

It is to be understood that this invention is not limited to the particular devices, machines, 
materials and methods described. Although particular embodiments are described, equivalent 
embodiments may be used to practice the invention. The described embodiments are not intended to limit 
the scope of the invention which is limited only by the appended claims. The examples below are 
provided to illustrate the subject invention and are not included for the purpose of limiting the invention. 
I cDNA Library Construction 

The COLNTUT16 cDNA library, in which Incyte clone 2790708 was discovered, was 
constructed from colon tumor tissue obtained from a 60-year-old Caucasian male during a left 
hemicolectomy. Pathology indicated an invasive grade 2 adenocarcinoma, a sessile mass located three 
cm from the distal margin. The tumor extended through the submucosa and superficially into the 
muscularis propria. The margins of resection were free of involvement. One of nine regional lymph 
nodes contained metastatic adenocarcinoma. The patient presented with blood in the stool and a change 
in bowel habits. Patient history included thrombophlebitis, inflammatory polyarthropathy, prostatic 
inflammatory disease, and depressive disorder. Previous surgeries included resection of the rectum, a 
vasectomy, and exploration of the spinal canal. Family history included a malignant colon neoplasm in a 
sibling. The COLNNOT08 cDNA library in which Incyte clone 1843578 was discovered is from the 
same patient. 

The frozen tissue was homogenized and lysed in TRIZOL reagent (1 gm tissue/10 ml TRIZOL; 
Life Technologies), a monophasic solution of phenol and guanidine isothiocyanate, using a Polytron 
homogenizer (PT-3000; Brinkmann Instruments, Westbury NY). After a brief incubation on ice, 
chloroform was added (1:5 v/v), and the lysate was centrifuged. The chloroform layer was removed to a 
fresh tube, and the RNA extracted with isopropanol, resuspended in DEPC-treated water, and treated with 
DNase for 25 min at 37°C. The RNA was re-extracted once with acid phenol-chloroform pH 4.7 and 
precipitated using 0.3M sodium acetate and 2.5 volumes ethanol. The mRNA was isolated with the 
OLIGOTEX kit (Qiagen, Valencia CA) and used to construct the cDNA library. 

The mRNA was handled according to the recommended protocols in the SUPERSCRIPT plasmid 
system (Life Technologies). The cDNAs were fractionated on a SEPHAROSE CL4B column 
(Amersham Pharmacia Biotech, Piscataway NJ), and those cDNAs exceeding 400 bp were ligated into 
pINCY 1 plasmid (Incyte Pharmaceuticals, Palo Alto CA). The plasmid was subsequently transformed 
into DH5α competent cells (Life Technologies). 
II Isolation and Sequencing of cDNA Clones 

Plasmid DNA was released from the cells and purified using the REAL Prep 96 plasmid kit 
(Qiagen). This kit enabled the simultaneous purification of 96 samples in a 96-well block using 
multi-channel reagent dispensers. The recommended protocol was employed except for the following 
changes: 1) the bacteria were cultured in 1 ml of sterile Terrific Broth (Life Technologies) with 
carbenicillin at 25 mg/L and glycerol at 0.4%; 2) after inoculation, the cultures were incubated for 19 
hours; at the end of incubation, the cells were lysed with 0.3 ml of lysis buffer; and 3) following 
isopropanol precipitation, the plasmid DNA pellet was resuspended in 0.1 ml of distilled water, after 
which samples were transferred to a 96-well block for storage at 4°C. 

The cDNAs were prepared using a MICROLAB 2200 (Hamilton, Reno NV) in combination with a 
DNA ENGINE thermal cycler (PTC200; MJ Research, Watertown MA). cDNAs were sequenced by the 
method of Sanger et al. (1975, J. Mol. Biol. 94:441f) using ABI PRISM 377 DNA sequencing systems 
(PE Biosystems) or MEGABACE 1000 sequencing systems (Molecular Dynamics, Sunnyvale CA). 

Most of the sequences disclosed herein were sequenced using standard ABI protocols and ABI 
kits (Cat. Nos. 79345, 79339, 79340, 79357, 79355; PE Biosystems). The solution volumes were used at 
0.25x-1.0x concentrations. Some of the sequences disclosed herein were sequenced using solutions and 
dyes from Amersham Pharmacia Biotech. 

III Selection, Assembly, and Characterization of Sequences 

The sequences used for coexpression analysis were assembled from EST sequences, 5' and 3' 
longread sequences, and full length coding sequences. Selected assembled sequences were expressed in 
at least three cDNA libraries. 

The assembly process is described as follows. EST sequence chromatograms were processed and 
verified. Quality scores were obtained using PHRED (Ewing et al. (1998) Genome Res 8:175-185; 
Ewing and Green (1998) Genome Res 8:186-194), and edited sequences were loaded into a relational 
database management system (RDBMS). The sequences were clustered using BLAST with a product 
score of 50. All clusters of two or more sequences created a bin, and each bin with its resident sequences 
represents one transcribed gene. 

Assembly of the component sequences within each bin was performed using a modification of 
Phrap, a publicly available program for assembling DNA fragments (Green, University of Washington, 
Seattle WA). Bins that showed 82% identity from a local pair-wise alignment between any of the 
consensus sequences were merged. 

Bins were annotated by screening the consensus sequence in each bin against public databases, 
such as GBpri and GenPept from NCBI. The annotation process involved a FASTn screen against the 
gbpri database in GenBank. Those hits with a percent identity of greater than or equal to 75% and an 
alignment length of greater than or equal to 100 base pairs were recorded as homolog hits. The residual 
unannotated sequences were screened by FASTx against GenPept. Those hits with an E value of less 
than or equal to 10^-8 were recorded as homolog hits. 
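
The two annotation cutoffs described above can be expressed as simple filters. The sketch below is illustrative only: the Hit record and its field names are hypothetical, the E value cutoff is read as 10^-8, and no particular search program or output format is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one database hit; field names are illustrative."""
    subject_id: str
    percent_identity: float   # percent, 0-100
    alignment_length: int     # base pairs
    e_value: float


def is_nucleotide_homolog(hit: Hit) -> bool:
    # Nucleotide screen: identity >= 75% and alignment length >= 100 bp.
    return hit.percent_identity >= 75.0 and hit.alignment_length >= 100


def is_protein_homolog(hit: Hit) -> bool:
    # Translated screen: E value <= 10^-8 (assumed reading of the cutoff).
    return hit.e_value <= 1e-8


# Made-up hits used only to exercise the filters.
hits = [Hit("g0000001", 81.2, 240, 1e-30), Hit("g0000002", 92.0, 60, 1e-4)]
print([h.subject_id for h in hits if is_nucleotide_homolog(h)])
```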

Sequences were then reclustered using BLASTn and Cross-Match, a program for rapid protein 
and nucleic acid sequence comparison and database search (Green, supra), sequentially. Any BLAST 
alignment between a sequence and a consensus sequence with a score greater than 150 was realigned 
using Cross-Match. The sequence was added to the bin whose consensus sequence gave the highest 
Smith-Waterman score (Smith et al., supra) amongst local alignments with at least 82% identity. 
Non-matching sequences were moved into new bins, and assembly processes were performed for the new bins. 
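
The bin-assignment rule in the preceding paragraph (highest Smith-Waterman score among local alignments with at least 82% identity, otherwise a new bin) can be sketched as follows. The Alignment record, the function name, and the example values are hypothetical; the alignment scores themselves are assumed to have been computed elsewhere.

```python
from typing import Iterable, NamedTuple, Optional


class Alignment(NamedTuple):
    """Hypothetical summary of one sequence-to-consensus local alignment."""
    bin_id: str
    sw_score: float
    percent_identity: float


def assign_bin(alignments: Iterable[Alignment],
               min_identity: float = 82.0) -> Optional[str]:
    """Return the bin whose consensus gave the highest Smith-Waterman score
    among alignments with at least min_identity percent identity, or None
    when no alignment qualifies (the sequence would then start a new bin)."""
    eligible = [a for a in alignments if a.percent_identity >= min_identity]
    if not eligible:
        return None
    return max(eligible, key=lambda a: a.sw_score).bin_id


# Made-up alignments: bin2 scores highest but falls below 82% identity.
candidates = [
    Alignment("bin1", 300.0, 90.0),
    Alignment("bin2", 500.0, 80.0),
    Alignment("bin3", 250.0, 95.0),
]
print(assign_bin(candidates))
```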

IV Coexpression Analyses of Known Colon Cancer Genes 

Fourteen known colon cancer genes were selected to identify novel genes that are closely 
associated with diseases of the colon. These known genes were carbonic anhydrase I, II, and IV, 
carcinoembryonic antigen family of proteins, colorectal carcinoma tumor-associated antigen, 
down-regulated in adenoma, fatty-acid binding protein, galectin, glutathione peroxidase, guanylin, cytokeratin 8 
and 20, cadherin, and intestinal mucin. The colon cancer genes which were examined in this analysis and 
brief descriptions of their functions are listed in Table 4. 



TABLE 4

GENE                DESCRIPTION AND REFERENCES

CA I, II, and IV    Carbonic anhydrase I, II, and IV
                    Isoenzymes in colorectal mucosa, differentially expressed in colon cancer
                    (Mori et al. (1993) Gastroenterology 105:820-6)

CEA                 Carcinoembryonic antigen family of proteins
                    Cell adhesion glycoprotein, diagnostic marker for colon cancer, prognostic
                    for survival from colon cancer (Carpelan-Holmstrom et al. (1996)
                    Dis Colon Rectum 39:799-805; Harrison et al. (1997) J Am Coll
                    Surg 185:55-59; Graham et al. (1998) Ann Surg 228:59-63)

CO-029              CO-029 colorectal carcinoma tumor-associated antigen
                    Cell surface glycoprotein (Sela et al. (1989) Hybridoma 8:481-491;
                    Szala et al. (1990) Proc Natl Acad Sci 87:6833-6837)

DRA                 Down-regulated in adenoma (DRA)
                    Anion transporter expressed predominantly in colon mucosa, expression
                    decreased in colon tumors, marker for progression of colon tumor
                    (Schweinfest et al. (1993) Proc Natl Acad Sci 90:4166-4170;
                    Byeon et al. (1996) Oncogene 12:387-396; Antalis et al.
                    (1998) Clin Cancer Res 4:1857-1863)

FABP                Fatty-acid binding protein
                    Hydrophobic ligand-binding protein expressed in liver and intestines,
                    differentially expressed in colon and other cancers (Davidson et al.
                    (1993) Lab Invest 68:663-675; Khan (1994) Proc Natl Acad Sci
                    91:848-852; Gromova et al. (1998) Int J Oncol 13:379-383)

Galec               Galectin family (Alternate name: IgE-binding protein)
                    Modulate cell adhesion, cell proliferation, and cell death, differentially
                    expressed in colon cancer including the metastatic phase (Sanjuan et al.
                    (1997) Gastroenterology 113:1906-15; Bresalier et al. (1998)
                    Gastroenterology 115:287-296; Perillo et al. (1998) J Mol Med
                    76:402-412)

Gpx2                Glutathione peroxidase
                    Anti-oxidant, differentially expressed in colon cancers
                    (Jendryczko et al. (1993) Neoplasma 40:107-109; Bravard et al.
                    (1994) Int J Cancer 59:843-7; Beno et al. (1995) Neoplasma 42:265-9)

Guan                Guanylin
                    Regulates chloride transport in epithelial tissues such as colon and shows
                    decreased expression in colorectal adenocarcinoma (Cohen et al. (1998)
                    Lab Invest 78:101-108)

ker 8 and 20        Cytokeratin 8 and 20
                    Cytoskeleton filaments and serum markers for colon cancer including the
                    metastatic phase (Funaki et al. (1997) Life Sci 60:643-652;
                    Nakamori et al. (1997) Dis Colon Rectum 40:S29-36)

Cadher              Cadherin family
                    Cell adhesion proteins and differentiation markers which are differentially
                    expressed in colon and other cancers (Breen et al. (1995) Ann Surg
                    Oncol 2:378-385; Eckert et al. (1997) Anticancer Res 17:7-12; Kreft
                    et al. (1997) J Cell Biol 136:1109-1121; Efstathiou et al. (1998)
                    Proc Natl Acad Sci 95:3122-3127)

MUC-2               Intestinal mucin
                    Expression decreased in majority of colorectal carcinomas (Ho et al.
                    (1996) Oncol Res 8:53-61; Hanski et al. (1997) J Pathol 182:385-391;
                    Hanski et al. (1997) Lab Invest 77:685-95)

From a total of 41,419 assembled gene sequences, we have identified seven novel genes that 
show strong association with 14 known colon cancer genes. Initially, the degree of association was 
measured by probability values using a cutoff p value less than 0.00001. The sequences were further 
examined to ensure that the genes that passed the probability test had strong association with known colon 
cancer genes. The process was reiterated so that the initial 41,419 genes were reduced to the final seven 
colon disease associated genes. Details of the expression patterns for the 14 known and seven novel 
colon disease genes are presented in Tables 5 and 6. 

Table 5 Co-Expression of the 14 Known Colon Cancer Genes (-log p) 

Table 6 Co-Expression of Seven Novel Genes and 14 Known Colon Cancer Genes (-log p) 

We examined genes that are coexpressed with the 14 known colon cancer genes, and identified 
seven novel genes that are strongly coexpressed. Each of the seven novel genes is coexpressed with at 
least one of the 14 known genes with a p-value of less than 0.00001. The coexpression of the seven novel 
genes with the 14 known genes is shown in Table 6. The entries in Table 6 are the negative log of the 
p-value (-log p) for the coexpression of the two genes. The novel genes identified are listed in the table by 
their Incyte clone numbers, and the known genes, by their abbreviated names as shown in Example V. 
For convenience, all the genes in Table 5 are assigned an identifying number, 1 to 14. 
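
The relationship between a p-value and a -log p table entry, and the 0.00001 cutoff, can be illustrated with a short Python sketch. The base of the logarithm is assumed here to be 10, and the function names and example p-values are hypothetical; how the p-values themselves are computed is outside this sketch.

```python
import math


def neg_log_p(p_value: float) -> float:
    """Negative base-10 logarithm of a p-value (base 10 is an assumption)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in the interval (0, 1]")
    return -math.log10(p_value)


def passes_cutoff(p_value: float, cutoff: float = 1e-5) -> bool:
    """True when the p-value is strictly below the cutoff of 0.00001."""
    return p_value < cutoff


# Made-up p-values for illustration; these are not entries from Table 5 or 6.
for p in (1e-3, 1e-5, 1e-9):
    print(p, round(neg_log_p(p), 2), passes_cutoff(p))
```

Under the base-10 assumption, a p-value below 0.00001 corresponds to a table entry greater than 5.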
V Novel Genes Associated with Colon Diseases 

Using the co-expression analysis method, we have identified seven novel genes that exhibit 
strong association, or co-expression, with 14 known colon cancer genes. 

Nucleic acids comprising the consensus sequences of SEQ ID NOs:1-7 of the present invention 
were first identified from Incyte Clones 1580553, 1843578, 1961467, 2296694, 2516888, 2790708, and 
3235282, respectively, and assembled according to Example III. BLAST and other motif searches were 
performed for SEQ ID NOs:1-7 according to Example VII. SEQ ID NOs:1-7 were translated and 
sequence identity was sought via comparison to known sequences. SEQ ID NOs:8 and 9 of the present 
invention were encoded by the nucleic acids of SEQ ID NOs:6 and 7, respectively. SEQ ID NOs:8 and 9 were 
also analyzed using BLAST and other motif search tools as disclosed in Example VI. Analysis of the 
novel genes is as follows. 

SEQ ID NO:1 (Incyte clone 1580553) is 219 nucleotides in length and has about 74% identity to 
the nucleic acid sequence of a mouse mucin glycoprotein (g2583092). SEQ ID NO:2 (Incyte clone 
2296694) is 252 nucleotides in length and has no known homologs in any of the public databases 
described in this application. SEQ ID NO:3 (Incyte clone 2516888) is 285 nucleotides in length and has 
no known homologs in any of the public databases described in this application. SEQ ID NO:4 (Incyte 
clone 2790708) is 1010 nucleotides in length and has about 56% identity to the nucleic acid sequence from 
nucleotide 107789 to nucleotide 108777 of human chromosome 9 (g2564750). SEQ ID NO:5 (Incyte 
clone 3235282) is 2616 nucleotides in length and has about 64% identity to the nucleic acid sequence 
encoding a mouse calcium sensitive chloride conductance protein (g3925280) and 70% identity to a 
partial cDNA of a colon specific gene, CSG5, which is 878 nucleotides long. SEQ ID NO:6 (Incyte 
clone 1843578) is 795 nucleotides in length and has about 64% identity to a nucleic acid sequence 
encoding a mouse calcium sensitive chloride conductance protein (g3925280). SEQ ID NO:7 (Incyte 
clone 1961467) is 2225 nucleotides in length and has about 6% identity to human gene signature 
HUMGS07792. SEQ ID NO:8 has 115 amino acids which are encoded by SEQ ID NO:6 and has no 
known homologs in any of the public databases described in this application. Motif analysis of SEQ ID 
NO:8 shows a potential phosphorylation site at S83. SEQ ID NO:9 has 90 amino acids which are 
encoded by SEQ ID NO:7 and has no known homologs in any of the public databases described in this 
application. Motif analysis of SEQ ID NO:9 shows five potential phosphorylation sites at T10, T6, T21, 
S66, and S86. 

VI Homology Searching for Colon Disease Genes and Their Encoded Proteins 

The polynucleotide sequences, SEQ ID NOs:1-7, and polypeptide sequences, SEQ ID NOs:8 and 
9, were queried against databases derived from sources such as GenBank and SwissProt. These 
databases, which contain previously identified and annotated sequences, were searched for regions of 
similarity using BLAST (Altschul, supra). BLAST searched for matches and reported only those that 
satisfied the probability thresholds of 10^-25 or less for nucleotide sequences and 10^-8 or less for 
polypeptide sequences. 

The polypeptide sequences were also analyzed for known motif patterns using MOTIFS, 
SPSCAN, BLIMPS, and HMM-based protocols. MOTIFS (Genetics Computer Group, Madison WI) 
searches polypeptide sequences for patterns that match those defined in the Prosite Dictionary of Protein 
Sites and Patterns (Bairoch, supra) and displays the patterns found and their corresponding literature 
abstracts. SPSCAN (Genetics Computer Group) searches for potential signal peptide sequences using a 
weighted matrix method (Nielsen et al. (1997) Prot Eng 10:1-6). Hits with a score of 5 or greater were 
considered. BLIMPS uses a weighted matrix analysis algorithm to search for sequence similarity 
between the polypeptide sequences and those contained in BLOCKS, a database consisting of short amino 
acid segments, or blocks of 3-60 amino acids in length, compiled from the PROSITE database (Henikoff, 
supra; Bairoch, supra), and those in PRINTS, a protein fingerprint database based on non-redundant 

20 sequences obtained from sources such as SwissProt, GenBank, PIR, and NRL-3D (Attwood eta]. ( 1 997) 
J. Chem Inf Comput Sci 37:417-424). For the purposes of the present invention, the BLIMPS searches ' 
reported matches with a cutoff score of 1 000 or greater and a cutoff probability value of 1 .0 x 1 0\ 
HMM-based protocols were based on a probabilistic approach and searched for consensus primary - 
structures of gene families in the protein sequences (Eddy, supra : Sonnhammer, supra ). More than 500 

25 known protein families with cutoff scores ranging from 10 to 50 bits were selected for use in this 
invention. 
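A pattern search of the kind MOTIFS performs can be illustrated with a short Python sketch. It is not the MOTIFS program: it is a simplified translation of PROSITE-style pattern syntax into a regular expression, and the function names are hypothetical. Anchors (`<`, `>`) and other less common syntax are not handled. The pattern [ST]-x-[RK] used in the example is the PROSITE protein kinase C phosphorylation site pattern; which patterns produced the sites reported in this application is not stated, so the choice is an assumption.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern, e.g. '[ST]-x(2)-[DE]', into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.strip()
        # Split an element into its core and an optional repeat, e.g. x(2) or x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":                       # any residue
            regex = "."
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            regex = core                      # single residue or [..] class
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def find_sites(sequence: str, pattern: str):
    """Return (1-based start position, matched residues) for every match.

    A lookahead is used so that overlapping matches are all reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# First nine residues of SEQ ID NO:9 as given in the sequence listing.
print(find_sites("MPTAGTRRR", "[ST]-x-[RK]"))
```

On this nine-residue fragment the sketch reports a single match beginning at T6, which agrees with one of the sites listed for SEQ ID NO:9.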

VII Labeling of Probes and Hybridization Analyses 
Blotting 

Polynucleotide sequences are isolated from a biological source and applied to a solid matrix (a
blot) suitable for standard nucleic acid hybridization protocols by one of the following methods. A
mixture of target nucleic acids is fractionated by electrophoresis through a 0.7% agarose gel in 1x TAE
[40 mM Tris acetate, 2 mM ethylenediamine tetraacetic acid (EDTA)] running buffer and transferred to a
nylon membrane by capillary transfer using 20x saline sodium citrate (SSC). Alternatively, the target
nucleic acids are individually ligated to a vector and inserted into bacterial host cells to form a library.
Target nucleic acids are arranged on a blot by one of the following methods. In the first method, bacterial
cells containing individual clones are robotically picked and arranged on a nylon membrane. The
membrane is placed on bacterial growth medium, LB agar containing carbenicillin, and incubated at 37°C
for 16 hours. Bacterial colonies are denatured, neutralized, and digested with proteinase K. Nylon
membranes are exposed to UV irradiation in a STRATALINKER UV-crosslinker (Stratagene, La Jolla
CA) to cross-link DNA to the membrane.

In the second method, target nucleic acids are amplified from bacterial vectors by thirty cycles of
PCR using primers complementary to vector sequences flanking the insert. Amplified target nucleic acids
are purified using SEPHACRYL-400 (Amersham Pharmacia Biotech). Purified target nucleic acids are
robotically arrayed onto a glass microscope slide previously coated with 0.05%
aminopropyl silane (Sigma-Aldrich, St Louis MO) and cured at 110°C. The arrayed glass slide
(microarray) is exposed to UV irradiation in a STRATALINKER UV-crosslinker (Stratagene).
Probe Preparation 

cDNA probe sequences are made from mRNA templates. Five micrograms of mRNA is mixed
with 1 µg random primer (Life Technologies), incubated at 70°C for 10 minutes, and lyophilized. The
lyophilized sample is resuspended in 50 µl of 1x first strand buffer (cDNA Synthesis system; Life
Technologies) containing a dNTP mix, [α-32P]dCTP, dithiothreitol, and MMLV reverse transcriptase
(Stratagene), and incubated at 42°C for 1-2 hours. After incubation, the probe is diluted with 42 µl dH2O,
heated to 95°C for 3 minutes, and cooled on ice. mRNA in the probe is removed by alkaline degradation.
The probe is neutralized, and degraded mRNA and unincorporated nucleotides are removed using a
PROBEQUANT G-50 Microcolumn (Amersham Pharmacia Biotech). Probes can be labeled with
fluorescent markers, Cy3-dCTP or Cy5-dCTP (Amersham Pharmacia Biotech), in place of the
radionuclide, [32P]dCTP.
Hybridization 

Hybridization is carried out at 65°C in a hybridization buffer containing 0.5 M sodium phosphate
(pH 7.2), 7% SDS, and 1 mM EDTA. After the blot is incubated in hybridization buffer at 65°C for at
least 2 hours, the buffer is replaced with 10 ml of fresh buffer containing the probe sequences. After
incubation at 65°C for 18 hours, the hybridization buffer is removed, and the blot is washed sequentially
under increasingly stringent conditions, up to 40 mM sodium phosphate, 1% SDS, 1 mM EDTA at 65°C.
To detect signal produced by a radiolabeled probe hybridized on a membrane, the blot is exposed to a
PHOSPHORIMAGER cassette (Molecular Dynamics), and the image is analyzed using IMAGEQUANT
data analysis software (Molecular Dynamics). To detect signals produced by a fluorescent probe
hybridized on a microarray, the blot is examined by confocal laser microscopy, and images are collected
and analyzed using GEMTOOLS gene expression analysis software (Incyte Pharmaceuticals).
VIII Production of Specific Antibodies

SEQ ID NOs:8-9, or portions thereof, substantially purified using polyacrylamide gel
electrophoresis or other purification techniques, are used to immunize rabbits and to produce antibodies
using standard protocols as described in Pound (supra).

Alternatively, the amino acid sequence is analyzed using LASERGENE software (DNASTAR,
Madison WI) to determine regions of high immunogenicity, and a corresponding oligopeptide is
synthesized and used to raise antibodies by means known to those of skill in the art. Methods for
selection of appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well
described in the art. Typically, oligopeptides 15 residues in length are synthesized using an ABI 431A
Peptide synthesizer (PE Biosystems) using Fmoc-chemistry and coupled to keyhole limpet hemocyanin
(KLH, Sigma-Aldrich) by reaction with N-maleimidobenzoyl-N-hydroxysuccinimide ester (Ausubel,
supra) to increase immunogenicity. Rabbits are immunized with the oligopeptide-KLH complex in
complete Freund's adjuvant. Resulting antisera are tested for antipeptide activity by, for example, binding
the peptide to plastic, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with
radio-iodinated goat anti-rabbit IgG.
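The selection of a hydrophilic 15-residue oligopeptide can be sketched as a sliding-window scan. This is an illustrative stand-in only: the algorithm used by LASERGENE is not described here, and the sketch substitutes the published Kyte-Doolittle hydropathy scale, treating the window with the lowest mean hydropathy as the most hydrophilic. The function name and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values; lower values are more hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophilic_window(sequence: str, width: int = 15):
    """Return (1-based start, peptide, mean hydropathy) of the most hydrophilic window.

    Raises ValueError if the sequence is shorter than the window and
    KeyError if it contains a residue outside the 20 standard amino acids.
    """
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the window width")
    best_start, best_score = 0, None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(KYTE_DOOLITTLE[residue] for residue in window) / width
        if best_score is None or score < best_score:
            best_start, best_score = start, score
    return best_start + 1, sequence[best_start:best_start + width], best_score


# Toy sequence (not from this application) with a charged stretch in the middle.
print(most_hydrophilic_window("IIIIIKKKKKIIIII", width=5))
```

A candidate chosen this way would still need the other considerations mentioned above, such as proximity to the C-terminus, before synthesis.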
What is claimed is:

1. A substantially purified polynucleotide comprising a gene that is coexpressed with one or
more known colon cancer genes in a plurality of biological samples, wherein each known colon cancer
gene is selected from the group consisting of carbonic anhydrase I, II, and IV (CA I, II, and IV),
carcinoembryonic antigen family of proteins (cea), colorectal carcinoma tumor-associated antigen
(CO-029), down-regulated in adenoma (dra), fatty-acid binding protein (fabp), galectin (galec), glutathione
peroxidase (gpx2), guanylin (guan), cytokeratin 8 and 20 (ker 8 and 20), cadherin (cadher), and intestinal
mucin (muc-2).
2. The polynucleotide of claim 1, comprising a polynucleotide sequence selected from the group
consisting of:

(a) a polynucleotide sequence selected from the group consisting of SEQ ID NOs:1-7;

(b) a polynucleotide encoding a polypeptide sequence selected from the group consisting of SEQ
ID NOs:8 and 9;

(c) a polynucleotide sequence having at least 75% identity to the polynucleotide sequence of (a)
or (b);

(d) a polynucleotide sequence which is complementary to the polynucleotide sequence of (a), (b),
or (c);

(e) a polynucleotide sequence comprising at least 18 sequential nucleotides of the polynucleotide
sequence of (a), (b), (c), or (d); and

(f) a polynucleotide which hybridizes under stringent conditions to the polynucleotide of (a), (b),
(c), (d), or (e).

3. A substantially purified polypeptide comprising the gene product of a gene that is coexpressed
with one or more known colon cancer genes in a plurality of biological samples, wherein each known
colon cancer gene is selected from the group consisting of carbonic anhydrase I, II, and IV (CA I, II, and
IV), carcinoembryonic antigen family of proteins (cea), colorectal carcinoma tumor-associated antigen
(CO-029), down-regulated in adenoma (dra), fatty-acid binding protein (fabp), galectin (galec),
glutathione peroxidase (gpx2), guanylin (guan), cytokeratin 8 and 20 (ker 8 and 20), cadherin (cadher),
and intestinal mucin (muc-2).

4. The polypeptide of claim 3, comprising a polypeptide sequence selected from the group
consisting of:

(a) the polypeptide having the amino acid sequence selected from the group consisting of SEQ
ID NOs:8 and 9;

(b) a polypeptide sequence having at least 85% identity to the polypeptide sequence of (a); and

(c) a polypeptide sequence comprising at least 6 sequential amino acids of the polypeptide
sequence of (a) or (b).

5. An expression vector comprising the polynucleotide of claim 2.

6. A host cell comprising the expression vector of claim 5.

7. A pharmaceutical composition comprising the polynucleotide of claim 2 in conjunction with a
suitable pharmaceutical carrier.

8. A pharmaceutical composition comprising the polypeptide of claim 3 in conjunction with a
suitable pharmaceutical carrier.

9. An antibody or antibody fragment comprising an antigen binding site, wherein the antigen
binding site specifically binds to the polypeptide of claim 4.

10. An immunoconjugate comprising the antigen binding site of the antibody or antibody
fragment of claim 9 joined to a therapeutic agent.

11. A method for diagnosing a disease or condition associated with the altered expression of a
gene that is coexpressed with one or more known colon cancer genes, wherein each known colon cancer
gene is selected from the group consisting of carbonic anhydrase I, II, and IV (CA I, II, and IV),
carcinoembryonic antigen family of proteins (cea), colorectal carcinoma tumor-associated antigen
(CO-029), down-regulated in adenoma (dra), fatty-acid binding protein (fabp), galectin (galec), glutathione
peroxidase (gpx2), guanylin (guan), cytokeratin 8 and 20 (ker 8 and 20), cadherin (cadher), and intestinal
mucin (muc-2), the method comprising the steps of:

(a) providing a biological sample;

(b) hybridizing a polynucleotide of claim 2 to the biological sample under conditions effective to
form one or more hybridization complexes;

(c) detecting the hybridization complexes; and

(d) comparing the levels of the hybridization complexes with the level of hybridization
complexes in a non-diseased sample, wherein the altered level of hybridization complexes compared with
the level of hybridization complexes of a non-diseased sample correlates with the presence of the disease
or condition.

12. A method for treating or preventing a disease associated with the altered expression of a gene
that is coexpressed with one or more known colon cancer genes in a subject in need, wherein each known
colon cancer gene is selected from the group consisting of carbonic anhydrase I, II, and IV (CA I, II, and
IV), carcinoembryonic antigen family of proteins (cea), colorectal carcinoma tumor-associated antigen
(CO-029), down-regulated in adenoma (dra), fatty-acid binding protein (fabp), galectin (galec),
glutathione peroxidase (gpx2), guanylin (guan), cytokeratin 8 and 20 (ker 8 and 20), cadherin (cadher),
and intestinal mucin (muc-2), the method comprising the step of administering to the subject in need the
pharmaceutical composition of claim 7 in an amount effective for treating or preventing the disease.

13. A method for treating or preventing a disease associated with the altered expression of a gene
that is coexpressed with one or more known colon cancer genes in a subject in need, wherein each known
colon cancer gene is selected from the group consisting of carbonic anhydrase I, II, and IV (CA I, II, and
IV), carcinoembryonic antigen family of proteins (cea), colorectal carcinoma tumor-associated antigen
(CO-029), down-regulated in adenoma (dra), fatty-acid binding protein (fabp), galectin (galec),
glutathione peroxidase (gpx2), guanylin (guan), cytokeratin 8 and 20 (ker 8 and 20), cadherin (cadher),
and intestinal mucin (muc-2), the method comprising the step of administering to the subject in need the
pharmaceutical composition of claim 8 in an amount effective for treating or preventing the disease.

14. A method for treating or preventing a disease associated with the altered expression of a gene
that is coexpressed with one or more known colon cancer genes in a subject in need, wherein each known
colon cancer gene is selected from the group consisting of carbonic anhydrase I, II, and IV (CA I, II, and
IV), carcinoembryonic antigen family of proteins (cea), colorectal carcinoma tumor-associated antigen
(CO-029), down-regulated in adenoma (dra), fatty-acid binding protein (fabp), galectin (galec),
glutathione peroxidase (gpx2), guanylin (guan), cytokeratin 8 and 20 (ker 8 and 20), cadherin (cadher),
and intestinal mucin (muc-2), the method comprising the step of administering to the subject in need the
antibody or the antibody fragment of claim 9 in an amount effective for treating or preventing the disease.

15. A method for treating or preventing a disease associated with the altered expression of a gene
that is coexpressed with one or more known colon cancer genes in a subject in need, wherein each known
colon cancer gene is selected from the group consisting of carbonic anhydrase I, II, and IV (CA I, II, and
IV), carcinoembryonic antigen family of proteins (cea), colorectal carcinoma tumor-associated antigen
(CO-029), down-regulated in adenoma (dra), fatty-acid binding protein (fabp), galectin (galec),
glutathione peroxidase (gpx2), guanylin (guan), cytokeratin 8 and 20 (ker 8 and 20), cadherin (cadher),
and intestinal mucin (muc-2), the method comprising the step of administering to the subject in need the
immunoconjugate of claim 10 in an amount effective for treating or preventing the disease.

16. A method for treating or preventing a disease associated with the altered expression of a gene
that is coexpressed with one or more known colon cancer genes in a subject in need, wherein each known
colon cancer gene is selected from the group consisting of carbonic anhydrase I, II, and IV (CA I, II, and
IV), carcinoembryonic antigen family of proteins (cea), colorectal carcinoma tumor-associated antigen
(CO-029), down-regulated in adenoma (dra), fatty-acid binding protein (fabp), galectin (galec),
glutathione peroxidase (gpx2), guanylin (guan), cytokeratin 8 and 20 (ker 8 and 20), cadherin (cadher),
and intestinal mucin (muc-2), the method comprising the step of administering to the subject in need the
polynucleotide sequence of claim 2 in an amount effective for treating or preventing the disease.
SEQUENCE LISTING



<110> INCYTE PHARMACEUTICALS, INC. 
Walker, Michael, G. 
Volkmuth, Wayne 
Klingler, Tod, M.
Lal, Preeti



<120> GENES ASSOCIATED WITH DISEASES OF THE COLON 



<130> PB-0007 PCT 

<140> To be assigned 
<141> Herewith 

<150> 09/255,381 
<151> 1999-02-22 



<160> 9 



<170> PERL Program 

<210> 1 

<211> 219 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature

<223> Incyte ID No.: 1580553CB1 



<400> 1 

caccttctat atctctccag gctcaatgga aacaacatta gccagcacta ccacaacacc 60
aggcctcagt gcaaaatcta ccatccttta cagtagctcc agatcaccag accaaacact 120
ctcacctgcc agcatgagaa gctccagcat cagtggagaa cccaccagct tgtatagcca 180
agcagagtca acacacacaa cagcgttccc tgccagcac 219



<210> 2 

<211> 252 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> unsure 
<222> 201 

<223> a or g or c or t, unknown, or other 
<220> 

<221> misc_feature

<223> Incyte ID No.: 2296694CB1 



<400> 2 

cttttcagaa ccccagatga gagccaatgt cagataaagt aagcatagca atgtagcagg 60
aactacaata gaagacattt tcactggaat tacaaagcag aattaaaatt atattgtaga 120
aggaaacacc aagaaaagaa tttccaggga aaatcctctt tgcaggtatt aattcttata 180
attttttgtc ttttggataa nctgtttact gcctcatctg aactgatccc aggtgaacgg 240
tttattgcct ag 252



<210> 3 

<211> 285 

<212> DNA 

<213> Homo sapiens 
<220>
<221> misc_feature
<223> Incyte ID No.: 516888CB1

<400> 3
gtggatgaca gggttggcca ccatggagca cczccaqqcz gacagagttg agacaagaac 60
ccatacctcc taactggcgc cactccaccc aggaggacrc agccagccct tgagcacaca 120
gggacacact gctgaacctt atattgaczt ccaatatgta tctttgctga gagaatgaat 180
gaaggaatga ttgtcagggg cactgccact gtggggggca tggccatcct ccaggtcact 240
gcggacttac ccctggccat ggcccagggc cctgctgtta ttatc 285



<210> 4 

<211> 1010 

<212> DNA 

<213> Homo sapiens 
<220>
<221> misc_feature
<223> Incyte ID No.: 2790708CB1

<400> 4
attttccttt actttttaaa taggttgttg cctcttatat atttattcta tgatgcaaat 60
gtcactatcc taattcctca gtttatg-tt aacagcacac agtggcactt ctatgattca 120
aatacarttg ataacctttg aaatcaatca gaatactgca aaattaattt ttctaaaaca 180
atgcttttat cgttatttct cctgttgaat catcagcaca atrtccaatt gaaaacactt 240
aaaataatct catattacaa tctttctcta acagaaccat gatgtaagga cagtgataac 300
aaatatctga caatgatatg attatttcct catccargga aattttcctt aataaactaa 360
agggctattt tctaaaaagc caaagcattg cttacaagaa cttttcatca tgacatggat 420
agacactcag attcatacat tcaaagggaa gtgtcatgta ttccctttca atccacccta 480
ttctattgtg ttatcttcct aaattatttt ctatctacat tcttcattct ctttcccatt 540
gaccctatgt tctgtgtgat aaaaattgcg tcattggagg ctttttaagg ttaagtatta 600
tgccccattt caccattaat caacatacaa cccttctcca tattttgtaa ttcctttcat 660
atacagaaaa aaagatacta taatttcttc aaaatgcttg atattaatga tatatgggaa 720
aacaattatt ttgtgcagca atcttcagat aactgggaaa ggccggggaa aaagagagat 780
actggtggtt atcaatgacc catgtataaa ttgtttttat tatgtaagct gtcttcacaa 840
atgtcttctt atgtatgatc attagaactg ttttatatat atatgtaaaa tttccacatt 900
atcgagacat tactttcagc agtgaagtaa tcctttttta actgccactt aatgaattca 960
ataaaatata atttattgta ttttgccata ataaactatt gatgactatt 1010



<210> 5 

<211> 2616 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature

<223> Incyte ID No.: 3235282CB1 

<400> 5 

aaaaatcgaa gcaacaaggt gttccgcagt atctctggta gaaatagagt ttataagtgt 60
caaggaggca gctgtcttag tagagcatgc agaattgatt ctacaacaaa actgtatgga 120
aaagattgtc aattctttcc tgataaagta caaacagaaa aagcatccat aatgtttatg 180
caaagtattg attctgttgt tgaattttgt aacgaaaaaa cccataatca agaagctcca 240
agcctacaaa acataaagtg caattttaga agtacatggg aggtgattag caattctgag 300
gattttaaaa acaccatacc catggtgaca ccacctcctc cacctgtctt ctcattgctg 360
aagatcagtc aaagaattgt gtgcttagtt cttgataagt ctggaagcat ggggggtaag 420
gaccgcctaa atcgaatgaa tcaagcagca aaacatttcc tgctgcagac tgttgaaaat 480
ggatcctggg tggggatggt tcactttgat agtactgcca ctattgtaaa taagctaatc 540
caaataaaaa gcagtgatga aagaaacaca ctcatggcag gattacctac atatcctctg 600
ggaqgaactt ccatctgctc tggaat~aaa tatgcatttc aggtgattgg agagctacat 660
-cccaactcg atggatccga agtactgctg ctgactgatg gggaggataa cactgcaagt 720
tcttgtattg atgaagtgaa acaaagtggg gccattgttc attttattgc tttgggaaga 780
gctgctgatg aagcagtaat agagatgagc aagataacag gaggaagtca tttttatgtt 840
"cagatgaag ctcagaacaa tggcctcatt gatgcttttg gggctcttac atcaggaaat 900
actgatctct cccagaagtc ccttcagctc gaaagtaagg gattaacact gaatagtaat 960

gcctggatga acgacaccgt cataattgat agtacagtgg gaaaggacac crtctttctc 1020 

atcacatgga acagcctgcc tcccagtatt tctctctggg atcccagtgg aacaataatg 1080 

gaaaatttca cagtggatgc aacttccaaa atggcctatc tcagtattcc aggaactgca 1140 

aaggtgggca cttgggcata caatcttcaa gccaaagcga acccagaaac a~-aactatt 1200 

acagtaactt ctcgagcagc aaattcttct gtgcctccaa ccacagtgaa tgctaaaatg 1260 

aataaggacg taaacagttt ccccagccca atgattgttt acgcagaaat tctacaagga 1320 

tatgtacctg ttctrggagc caatgtgact gccttcattg aatcacagaa tggacataca 1380 

gaagttttgg aacttttgga taatggtgca ggcgctgatt ctttcaagaa ugatggagtc 1440 

tactccaggt attttacagc atatacagaa aatggcagat atagcttaaa agttcgggct 1500 

catggaggag caaacactgc caggctaaaa ttacggcctc cactgaatag agccgcgtac 1560 

ataccaggct gggtagtgaa cggggaaatt gaagcaaacc cgccaagacc igaaattgat 1620 

gaggaractc agaccacctt ggaggatttc agccgaacag catccggagg tgcatttgtg 1680 

gtatcacaag tcccaagcct tcccttgcct gaccaatacc caccaagtca aatcacagac 1740 

cttgatgcca cagttcatga ggataagatt attcttacat ggacagcacc aggagataat 1800 

tttgatgttg gaaaagttca acgttatatc ataagaataa gtgcaagtat tcttgatcta 1860 

agagacagtt ttgatgatgc tcttcaagta aatactacrg atctgtcacc aaaggaggcc 1920 

aactccaagg aaagctttgc atttaaacca gaaaatatct cagaagaaaa tgcaacccac 1980 

atatttattg ccattaaaag tatagataaa agcaatttga catcaaaagt atccaacatt 2040 

gcacaagtaa ctttgtttat ccctcaagca aatcctgatg acattgatcc tacacctact 2100 

cctactccta ctcctactcc tgataaaagt cataattctg gagttaatat ttctacgctg 2160 

gtattgtctg tgattgggtc tgttgtaatt gttaacttta ttttaagtac caccatttga 2220 

accttaacga agaaaaaaat cttcaagtag acctagaaga gagrtttaaa aaacaaaaca 2280 

argraagraa aggatatttc tgaatcttaa aatrcatccc atgtgtgatc araaactcat 2340 

aaaaataatt ttaagatgtc ggaaaaggat actttgatta aataaaaaca ctcatggata 2400 

tgcaaaaact gtcaagatta aaatttaata gtttcattta tttgttattt tatttgtaag 2460

aaatagtgat gaacaaagat cctttttcat actgatacct ggttgtatat tatttgatgc 2520 

aacagttttc tgaaatgata tttcaaattg catcaagaaa ttaaaatcat ctatctgagt 2580 

agrcaaaata caagtaaagg agagcaaata aacatc 2616 



<210> 6 
<211> 795 
<212> DNA 

<213> Homo sapiens 
<220> 

<221> misc_feature

<223> Incyte ID No.: 1843578CB1 

<400> 6 

aggagaccca ggggtcccag agctgggctg gcgggaggcg taatccggcg gggtgagggt 60
tgatcgaaga gccccgcgcg cactgccgct cacagcccct tcccgagtgc agagcgggca 120
gagaagtcca ctgcttttaa ggccctgcac tgaaaatgca agctcaggcg ccggtggtcg 180
tzgtgaccca acctggagtc ggtcccggtc cggcccccca gaactccaac tggcagacag 240
gcatgtgtga ctgtttcagc gactgcggag tctgtctctg tggcacattt tgtttcccgt 300
gccttgggtg tcaagttgca gctgatatga atgaatgctg tctgtgtgga acaagcgtcg 360
caatgaggac tctctacagg acccgatatg gcatccctgg atctatttgt gatgactata 420
tggcaactct ttgctgtcct cattgtactc tttgccaaat caagagagac atcaacagaa 480
ggagagccat gcgtactttc taaaaactga tggtgaaaag ctcttaccga agcaacaaaa 540
itcagcagac acctcttcag cttgagttct tcaccatctt ttgcaactga aatatgatgg 600
atatgcttaa gtacaactga tggcatgaaa aaaatcaaat ttttgattta trataaatga 660
argttgtccc tgaacttagc taaatggrgc aacttagttt ctccttgctt tcatattatc 720
gaatttcctg gcttataaac tttttaaatt acatttgaaa tataaaccaa atgaaatatt 780
aaaaaaaaaa aaaaa 795



<210> 7 

<211> 2225 

<212> DNA 

<213> Homo sapiens 
<220>
<221> misc_feature
<223> Incyte ID No.: 1961467CB1

<400> 7
gttcgggtcc tcggaccaca ctctggtttt ctatgctgtt ttggtgcaag tacaactczz 60
gtagtcatgg ctttaggagc aataggattt taanaaacag aacccacccc aaagccatga 120
ctacgacagt tgtacttgca ccaaaacagc atagaaaacc agagtgtggt gggaggacrc 180
gaacccggtt gggggaggac gtgagtaggg gcczggaggg cgcagggtca ttaatctgcn 240
gggagaacat tgtgctttag cccagggagg ggaggggtgg ggcaaatgca ccgaggtccc 300
cacuitttcc tgctgccctc ggcaccctgg ggatgcaggc atctgggcac atctgcccc. 360
tattgctgcc caccagcgtt aaacgccccc gatcccaaca ctagcaccac aggtggttcr 420
ggggcaggga gaggcaggaa tgggaaaatt gcttagagaa agattccact agaatccag- 480
gaatzgtgct cagttctctt tacttcctac aaccgagtac atgggtcaca gggtggaggg 540
tgcaacagga catggaacat gcccctccgt gccccccaac acacacctgc acacaggarg 600
gtgg-gtctg cagcatcaca ggtcatgcag ggcatgggga aggggaggtt cacacacaca 660
tagatgccca cagcgggtac cagacggaga acacccctga atatacatag ctgtacatgg 720
ggaaccccca ggtccccacc ccaaccctct cccctgtctc gctgtccccc gcaggggaac 780
tatattgctt tgagagagcc accccagggg ctgctctgcc aggcaccctc ccctcccacc 840
cacccccatt ttggcacatc tgcaagacac acagcagcga gagtaggcac cctcccttcc 900
caggcttctg tggcctggag ctggagaagg gggtaggaga cttcatcctc catcctcccc 960
taacccttcc caaacccctg ccaaacccac icaagccaga acccaccccc accccccaaa 1020
cacacataca aagctgagct atccaggaac acaagggaaa caaggagatt gtccagggcg 1080
ggagcggagg cagcggggga agaagactgg aagcagagac ctcccccctt gtggggggca 1140
gactggcaca acagctactt tagtgcaatt ggagagggtg cccagagtga gaggtggaga 1200
agggagggaa ggcggtcccc aacttccctg ggggcaaagt caggcttcca gattccccag 1260
ggaaagggcc tagcaggagt gggtgagggc caaggtggat cctctggtta cccgccaccc 1320
tctgccctcc caaatgcagt gacagtgtcc ccctcacacc taagtgggca acagcagcct 1380
tggagtcagt accttcaagc aattcaaaga gcagaccctc cccaccccag cttcacccca 1440
tctctgggat ttggtcgctt ctctaggggt rgggttggga ggagggagcc cccaaggcac 1500
acccrtccct ctctacctcc cgattcccag accacngggc ttggtcctca aagattcczc 1560
acctccgccc ttgcccaacc tgggtcaagg ctgcagaagg ctggagccac cacaattaga 1620
ggggaagggg ctgctttgtt ccttatccct ccttcttaaa aggtagggtt caaactaggc 1680
gggarggggg cccatactgg tttgccccag gagragggct tctgggctag ggtctgtaag 1740
gctartttcc tttgcggtgg gaaggggagg taggggatga acactgggta tgggaagtgg 1800
grgagaaatg gctgagaggg aaggaggaag gggcctcccc gctggagcag tcactggagt 1860
catttagaca aaaacactca tgtgcataag atacacagtg cgcaaactca gccctgccag 1920
cccggcccca atcccacctc tcaggactcc ttccaagacc ctggaggagg ttctggggar 1980
acagctgtag aaccgttcac tctggcccca tccaccccac ctccagcctc ttctccccr- 2040
ctaggrccag ggagtaagaa ggtgctcggg tgggcagaca gtggtggaaa cagtattgag 2100
ttttcctttg gttacatatt gaaggcaaag gtgagctgga cttacagtca aaacggatag 2160
gggtgaggaa ggaagagggg ccatggctgg ggttggagag ggaggtaggc cctcgtcacc 2220
ccctc 2225



<210> 8 

<211> 115 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> misc_feature

<223> Incyte ID No.: 1843578CD1 
<210> 9
<211> 90 
<212> PRT 

<213> Homo sapiens 
<220> 

<221> misc_feature

<223> Incyte ID No.: 1961467CD1 

<400> 9 

Met Pro Thr Ala Gly Thr Arg Arg Arg
1               5
Ser Cys Thr Trp Gly Thr Pro Arg Ser
                20
Leu Ser Cys Cys Pro Pro Gln Gly Asn
                35
Pro Pro Gln Gly Leu Leu Cys Gln Ala
                50
Pro His Phe Gly Thr Ser Ala Arg His
                65
Thr Leu Pro Ser Gln Ala Ser Val Ala
                80
This International Search Report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. [X] Claims Nos.:
because they relate to subject matter not required to be searched by this Authority, namely:

Although claims 12, 13 and 16 are directed to a method of treatment of the
human/animal body, the search has been carried out and based on the alleged
effects of the compound/composition.

2. [ ] Claims Nos.:
because they relate to parts of the International Application that do not comply with the prescribed requirements to such
an extent that no meaningful International Search can be carried out, specifically:

3. [ ] Claims Nos.:
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).

Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet)

This International Searching Authority found multiple inventions in this international application, as follows:

1. [ ] As all required additional search fees were timely paid by the applicant, this International Search Report covers all
searchable claims.

2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment
of any additional fee.

3. [ ] As only some of the required additional search fees were timely paid by the applicant, this International Search Report
covers only those claims for which fees were paid, specifically claims Nos.:

4. [X] No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is
restricted to the invention first mentioned in the claims; it is covered by claims Nos.:

partially 1-3, 5-8, 11-13 and 16

[ ] The additional search fees were accompanied by the applicant's protest.
FURTHER INFORMATION CONTINUED FROM PCT/ISA/210



1. Claims: Partially 1-3, 5-8, 11-13 and 16 

Polynucleotide of sequence SEQ ID NO:1, analogs and variants
thereof, expression vector and host cell comprising the 
same, pharmaceutical composition comprising the 
polynucleotide; polypeptide encoded thereby, pharmaceutical 
composition comprising it and use thereof in a therapeutic 
treatment; use of the polynucleotide for diagnostic and 
treatment 



2. Claims: Partially 1-3, 5-8, 11-13 and 16 

Polynucleotide of sequence SEQ ID NO:2, analogs and variants
thereof, expression vector and host cell comprising the 
same, pharmaceutical composition comprising the 
polynucleotide; polypeptide encoded thereby, pharmaceutical 
composition comprising it and use thereof in a therapeutic 
treatment; use of the polynucleotide for diagnostic and 
treatment 



3. Claims: Partially 1-3, 5-8, 11-13 and 16 

Polynucleotide of sequence SEQ ID NO:3, analogs and variants
thereof, expression vector and host cell comprising the
same, pharmaceutical composition comprising the
polynucleotide; polypeptide encoded thereby, pharmaceutical 
composition comprising it and use thereof in a therapeutic 
treatment; use of the polynucleotide for diagnostic and 
treatment 



4. Claims: Partially 1-3, 5-8, 11-13 and 16 

Polynucleotide of sequence SEQ ID NO:4, analogs and variants
thereof, expression vector and host cell comprising the 
same, pharmaceutical composition comprising the 
polynucleotide; polypeptide encoded thereby, pharmaceutical 
composition comprising it and use thereof in a therapeutic 
treatment; use of the polynucleotide for diagnostic and 
treatment 



5. Claims: Partially 1-3, 5-8, 11-13 and 16 

Polynucleotide of sequence SEQ ID NO:5, analogs and variants
thereof, expression vector and host cell comprising the 
same, pharmaceutical composition comprising the 
polynucleotide; polypeptide encoded thereby, pharmaceutical 
composition comprising it and use thereof in a therapeutic 
treatment; use of the polynucleotide for diagnostic and 
treatment 



6. Claims: Partially 1-16 

Polynucleotide of sequence SEQ ID NO:6, analogs and variants
thereof, expression vector and host cell comprising the
same, pharmaceutical composition comprising the
polynucleotide; polypeptide of sequence SEQ ID NO:8,
antibody binding to it, immunoconjugate comprising an
antigen binding site thereof, pharmaceutical compositions
comprising such and use thereof in therapeutic treatments;
use of the polynucleotide for diagnostic and treatment



7. Claims: Partially 1-16 

Polynucleotide of sequence SEQ ID NO:7, analogs and variants
thereof, expression vector and host cell comprising the
same, pharmaceutical composition comprising the
polynucleotide; polypeptide of sequence SEQ ID NO:9,
antibody binding to it, immunoconjugate comprising an
antigen binding site thereof, pharmaceutical compositions
comprising such and use thereof in therapeutic treatments;
use of the polynucleotide for diagnostic and treatment